The title of this chapter reflects the three rules of system administration: 1) Backup;
2) Backup; and 3) Backup! Although this advice may sound trite, the number of people who
have lost important or valuable data, not to mention all the configuration information
they spent days getting correct, is enormous. Even if you don't have a tape drive or other
backup storage device, get in the habit of backing up the most important pieces of
information. This chapter looks at how to properly back up information.
If you run a system that has many users, network access, e-mail, and so on, backups are
a very important aspect of the daily routine. If your system is used for your own pleasure
and is not used for any important files, backups are not as important except as a way to
recover your configuration and setup information. You should make backups either way; the
difference is the regularity with which you make them.
A backup is a copy of the filesystem or files on part of a filesystem stored onto
another medium that can be used later to recreate the original. In most UNIX systems, the
medium used for backups is tape, but you can also use floppy disks or secondary and
removable hard disks.
So many potential sources of damage to a modern computer system exist that they can be
overwhelming. Damage to your hard disks and their filesystems and data can occur from
hardware failures, power interruptions, or badly typed commands. Part of the potential for
damage with Linux lies in the nature of the operating system itself. Because Linux is a
multiuser and multitasking operating system, many system files are open at any moment.
Almost every millisecond, data is being written to or read from a hard disk (even when
the system has no users or user-started background processes on it). Also, Linux maintains
a lot of information in memory about its current state and the state of the filesystems.
This information must be written to disk frequently. When CPU processes are interrupted,
system files and tables can be lost from memory. Disk files can be left in a temporary
state that doesn't match the real filesystem status.
Although damage to a filesystem can occur from many sources, not all of which are under
the control of the system administrator, it is the administrator's task to make sure the
system can be restored to a working state as quickly as possible. Having a backup is
sometimes your only chance of getting back lost information. Although the process of
making backups can be tiresome and time-consuming, this inconvenience is often outweighed
by the time required to recoup any lost information in case of problems. With utilities
like cron available, the task of backing up is much easier, too.
One final aspect about backups you need to consider is where to keep the backup media
after it has been used. For most home users, the only option is to store the tapes,
drives, floppy disks, or other media in the same place as the Linux machine. Make sure the
location is away from magnetic fields (including telephones, modems, televisions,
speakers, and so on). For systems that are used for more than pleasure, consider keeping
copies away from the main machine, preferably away from the same physical location. This
type of off-site backup enables you to recover in case of a catastrophe, such as a fire,
that destroys your system and backup media library.
By far the most commonly used medium for backups is tape, especially tape cartridges.
Tape is favored because it has a low cost, a relatively easy storage requirement, and
reasonable speed. The process of writing and reading data from a tape is reliable, and
tapes are portable from machine to machine. All you need, of course, is a tape drive. If
you don't have one, you need to find another usable medium for backups.
Possible alternative media include removable hard disks of many different types, such
as the Iomega Bernoulli or ZIP drives. These cartridges use magnetic head technology just
like a normal hard drive. You can remove these disk-platter systems, which usually come in
a protective cartridge, from the main system and store them elsewhere. You can then cycle
through several of these disks as you would with tapes. In some cases, removable
cartridges are available for a competitive price compared to tape cartridges, although
some high-capacity removable cartridges cost more (but also offer more storage). The cost
of the removable cartridge drive varies depending on the capacity, manufacturer, and
technology, but it is also competitive with a tape drive in many cases.
Several new magneto-optical cartridge systems for DOS and Windows are usable under
Linux, too. These systems tend to be small 3.5-inch cartridge systems that fit into a
small drive unit. A 230M magneto-optical cartridge and drive can cost less than some tape
drives, and they present a more secure backup medium because magneto-optical systems are
not susceptible to magnetic fields. They have a potentially longer life, too.
Large-capacity magneto-optical systems, now approaching 2.4G, are currently available,
although they tend to cost as much as a new computer.
Another possibility is another hard disk. With the price of hard disks dropping all the
time, you can add another hard disk just for backups to your system (or any other system
connected by a network) and use it as a full backup.
The popularity of writable CD-ROM and WORM (write once, read many) drives makes them a
possibility as well, although you must bear in mind that this type of media can only be
written to once (the disks can't be reused). This type of media does have an advantage for
archival purposes where you may need to prove certain file dates are accurate. CDs are
also useful for permanent storage of important files like accounting records, personal
letters, documents such as wills, and binaries. CD-ROM discs can hold 750M of data,
although most consumer discs are designed for 650M.
Consider a floppy disk drive as a last resort backup device for large filesystems,
although it is very good for backing up small files. High-capacity floppy disk drives are
beginning to appear now, but the lack of Linux drivers makes them unusable for most backup
situations.
One of the most important aspects of making backups is to make them regularly.
Regularity is much more important for systems that support many users and have constantly
changing filesystems. If your Linux machine is used only for your own purposes, you can
make backups whenever you feel there is material that should be backed up.
For most systems with a few users, constant Internet access for e-mail or newsgroups,
and similar daily changes to the filesystem, a daily backup schedule is important. You
don't have to make a full backup of everything on your hard drives every day, but you
should consider using incremental backups, which copy only those files that are new or
have changed since the last backup.
Most UNIX system administrators prefer to perform backups during the night or early
hours of the morning because few users are logged in, there is no real load on the CPU,
and the system has the least number of open files at this time. Because backups are easily
automated using cron (see Chapter 23, "The cron and at
Programs"), you can set the exact backup time to minimize the impact on any other
background processing tasks that the system may be running. Because you don't have to
manually start the backup process, you can do it at any time. All the system administrator
has to do in this kind of backup schedule is check that the backup was completed properly,
change the backup media, and log the backup.
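As a rough sketch of the kind of job cron could run overnight, the script below wraps a tar run and records whether it succeeded, so there is something to check in the morning. The crontab time, the script path, the function name run_backup, and the scratch directories are all assumptions made for illustration; a real job would write to your backup device instead.

```shell
#!/bin/sh
# Hypothetical crontab entry (02:30 every day); the script path is an assumption:
#   30 2 * * * /usr/local/sbin/nightly_backup.sh

# run_backup SOURCE_DIR ARCHIVE LOGFILE
# Archives SOURCE_DIR into ARCHIVE and appends a one-line result to LOGFILE.
run_backup() {
    src=$1
    archive=$2
    log=$3
    if tar cf "$archive" -C "$src" . 2>/dev/null; then
        echo "$(date '+%Y-%m-%d %H:%M') OK $src -> $archive" >> "$log"
    else
        echo "$(date '+%Y-%m-%d %H:%M') FAILED $src -> $archive" >> "$log"
        return 1
    fi
}

# Demonstration against a scratch directory instead of a real tape device.
work=$(mktemp -d)
mkdir "$work/data"
echo "sample" > "$work/data/file.txt"
run_backup "$work/data" "$work/nightly.tar" "$work/backup.log"
cat "$work/backup.log"
```

The log line gives you a quick way to confirm that the backup completed before you change the media.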
For those systems with a single user and a lightly loaded Linux system, backups can be
done practically anytime, although it is a good idea to have the backups performed
automatically if your system is on all the time. If your Linux system is only active when
you want to use it, get in the habit of making a backup while you do other tasks on the
system.
When DOS or Windows users move to UNIX, they sometimes have the bad habit of keeping a
single tape (or other media) and continually recycling that one unit every time they make
a backup. It is foolhardy to keep only one backup copy of a system as this prevents you
from moving back to previous backups. For example, suppose you deleted a file a week ago
and had it safely stored on a backup tape at that time. When you reuse the backup tape,
the old contents are erased and you can never get the old file back.
Ideally, you should keep backup copies for days, or even weeks, before reusing them. On
systems with several users, this habit is even more important because users only remember
that they need a file they deleted two months ago after you have recycled the tape a few
times. Some backup scheduling methods can help get around this problem, as you will see in
a moment. The ideal backup routine varies depending on the system administrator's ideas
about backups, but a comprehensive backup system requires at least two weeks of daily
incremental backups and a full backup every week.
A full backup is a complete image of everything on the filesystem, including all files.
The backup media required for full backups is usually close to the total size of your
filesystem. For example, if you have 150M used in your filesystem, you need about 150M of
tape or other media for a backup. With compression algorithms, some backup systems can get
the requirements much lower, but compression is not always available. Also, you may need
several volumes of media for a single full backup, depending on the capacity of the backup
unit. If your tape drive can only store 80M on a cartridge and you have to back up 150M,
you need two tapes in sequence for the one backup. Because the Linux system's cron utility
can't change tapes automatically, full backups over several volumes require some operator
interaction. Obviously, making a full system backup on low-capacity media (like floppy
disks) is a long, tedious process because there are many volumes that must be switched.
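The volume count in the example above is a simple rounding-up division. The following sketch shows the arithmetic; the function name volumes_needed is hypothetical, and the calculation ignores any overhead that the archive format itself adds.

```shell
#!/bin/sh
# volumes_needed DATA_SIZE VOLUME_SIZE
# Both sizes must be in the same unit. Prints the number of volumes
# required, rounding up to a whole volume.
volumes_needed() {
    data=$1
    volume=$2
    echo $(( (data + volume - 1) / volume ))
}

volumes_needed 150 80         # 150M on 80M cartridges: prints 2
volumes_needed 153600 1200    # 150M (in kilobytes) on 1200K floppies: prints 128
```

The second call shows why a full backup on floppy disks is so tedious.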
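Incremental backups (sometimes loosely called differential backups, although a differential
backup strictly copies everything changed since the last full backup) back up only the files
that have been changed or created since the last backup. Unlike DOS, Linux doesn't have a file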
indicator that shows what files have been backed up. However, you can use the modification
date to effectively act like a backup indicator.
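One way to use the modification date as a backup indicator is to keep a timestamp file and archive only the files that are newer than it. The sketch below assumes GNU tar (the -T option, which reads filenames from a list, is not one of the modifiers covered in this chapter), and the stamp filename and scratch directories are hypothetical.

```shell
#!/bin/sh
# Demonstration in a scratch directory; a real run would use the backup device.
work=$(mktemp -d)
mkdir "$work/home"
echo "old" > "$work/home/old.txt"

# The stamp file records when the last backup ran (hypothetical name).
stamp="$work/last_backup.stamp"
touch "$stamp"

sleep 1    # make sure the next file is clearly newer than the stamp
echo "new" > "$work/home/new.txt"

# List regular files modified after the stamp, then archive only those.
( cd "$work/home" && find . -type f -newer "$stamp" -print ) > "$work/changed.list"
tar cf "$work/incremental.tar" -C "$work/home" -T "$work/changed.list"

# Update the stamp so the next run starts from this point.
touch "$stamp"

tar tf "$work/incremental.tar"
```

Only new.txt ends up in the archive, because old.txt was written before the stamp was set.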
Incremental backups are sometimes difficult to make with Linux unless you restrict
yourself to particular areas of the filesystem that are likely to have changed. For
example, if your users are all in the /usr directory, you can back up only that filesystem
area instead of the entire filesystem. This kind of backup is often called a partial
backup, as only a part of the filesystem is saved. (Incremental backups can be made under
any operating system by using a background process that logs all changes of files to a
master list, and then uses the master list to create backups. Creating such a scheme is
seldom worth the effort, though.)
How often should you back up your system? The usual rule is to back up whenever you
can't afford to lose information. For many people, this criterion means daily backups.
Imagine that you have been writing a document or program, and you lose all the work since
the last backup. How long will it take to rewrite (if at all possible)? If the rewriting
of the loss is more trouble than the time required to perform a backup, make a backup!
So how can you effectively schedule backups for your system, assuming you want to save
your contents regularly? Assuming that your system has several users (friends calling in
by modem or family members who use it) and a reasonable volume of changes (e-mail,
newsgroups, word processing files, databases, or applications you are writing, for
example), consider daily backups. The most common backup schedule for a small,
medium-volume system requires between 10 and 14 tapes, depending on whether backups are
performed on weekends. (The rest of this section uses tapes as the backup medium, but you
can substitute any other device that you want.)
Label all backup tapes with names that reflect their use. For example, label your tapes
Daily 1, Daily 2, and so on up to the total number of daily use tapes, such as Daily 10.
Cycle through these daily use tapes, restarting the cycle after you have used all the
tapes (so that Daily 1 follows after Daily 10). With this many tapes, you have a two-week
supply of backups (ignoring weekend backups, in this case), enabling you to recover
anything going back two weeks. If you have more tapes available, use them to extend the
backup cycle.
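If you want a script to tell you which tape comes next, the rotation reduces to modular arithmetic. This is only a sketch; the function name daily_tape and the idea of counting backups from 1 are assumptions for illustration.

```shell
#!/bin/sh
# daily_tape N [TOTAL]
# Prints the label of the tape for the Nth backup (counting from 1),
# cycling through TOTAL daily tapes (10 if TOTAL is omitted).
daily_tape() {
    n=$1
    total=${2:-10}
    echo "Daily $(( (n - 1) % total + 1 ))"
}

daily_tape 1     # prints: Daily 1
daily_tape 10    # prints: Daily 10
daily_tape 11    # prints: Daily 1 (the cycle restarts)
```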
The backups can be either full or partial, depending on your needs. A good practice is
to make one full backup for every four or five partial. You can make a full backup of your
entire filesystem on Mondays, for instance, but only back up the /usr directories the
other days of the week. Make an exception to this process if you make changes to the Linux
configuration so that you have the changes captured with a full backup. You can keep track
of the backups using a backup log, which is covered in the next section.
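The Monday-full, otherwise-partial routine described above could be expressed as a small decision function. The function name backup_plan is hypothetical; the weekday numbering follows the output of date +%u, where 1 is Monday.

```shell
#!/bin/sh
# backup_plan WEEKDAY
# WEEKDAY is 1 (Monday) through 7 (Sunday), as printed by `date +%u`.
# Prints the backup type and the directory to archive.
backup_plan() {
    case $1 in
        1) echo "full /" ;;
        *) echo "partial /usr" ;;
    esac
}

backup_plan 1                # prints: full /
backup_plan 3                # prints: partial /usr
backup_plan "$(date +%u)"    # today's plan
```

A backup script could read this output to decide which directory to hand to tar.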
An expansion of this daily backup scheme that many administrators (including the
author) prefer is the daily and weekly backup cycle. This backup system breaks up the
number of tapes into daily and weekly use. For example, if you have 14 tapes, use 10 for a
daily cycle as already mentioned. You can still call these tapes Daily 1 through Daily 10.
Use the other four tapes in a biweekly cycle and name them Week 1, Week 2, Week 3, and
Week 4.
To use this backup system, perform your daily backups as already mentioned, but use the
next weekly tape when you get to the end of the daily cycle. Then you cycle through the
daily tapes again, followed by the next weekly tape. (Your backup cycle is Daily 1 through
Daily 10, Week 1, Daily 1 through Daily 10, Week 2, and so on.)
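The combined cycle can be sketched the same way: every eleventh backup goes to the next weekly tape. The function name cycle_tape is hypothetical, and the sketch assumes the 10 daily and 4 weekly tapes used in this example.

```shell
#!/bin/sh
# cycle_tape N
# Prints the tape label for the Nth backup (counting from 1) in a cycle of
# ten daily tapes followed by one of four weekly tapes.
cycle_tape() {
    n=$1
    pos=$(( (n - 1) % 11 + 1 ))    # position within the 11-backup cycle
    round=$(( (n - 1) / 11 ))      # cycles completed before this one
    if [ "$pos" -le 10 ]; then
        echo "Daily $pos"
    else
        echo "Week $(( round % 4 + 1 ))"
    fi
}

cycle_tape 10    # prints: Daily 10
cycle_tape 11    # prints: Week 1
cycle_tape 12    # prints: Daily 1
cycle_tape 22    # prints: Week 2
cycle_tape 55    # prints: Week 1 (the weekly tapes start over)
```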
This backup cycle has one major advantage over a simple daily cycle. When the entire
cycle is underway, there are 10 daily backups, which cover a two-week period. The biweekly
tapes extend back over four complete daily cycles, or eight weeks. You can then recover a
file or group of files from the filesystem as it was two months ago, instead of just two
weeks. This backup method gives you a lot more flexibility in recovering information that
was not noticed as missing or corrupt right away. If even more tapes are available, you
can extend either the daily or biweekly cycle, or add monthly backups.
Many system administrators begin their careers by making regular backups, as they
should. However, when they get to the point where they have to restore a file from a
backup tape, they have no idea which tapes include the file or which tapes were used on
what days. Some system administrators get around this problem by placing a piece of paper or
sticky note on each tape with the date and contents on it. This solution means you have to
flip through the tapes to find the one you want, though, which can be awkward when you
have lots of tapes. For this reason, you should keep a backup log. (A log is a good idea
for backups on other operating systems as well.)
Whenever you make a backup, you should update the backup log. A backup log doesn't have
to be anything complex or elaborate. You can use the back of a notebook with a couple of
vertical columns drawn in, use a form on the computer itself (which you should print out
regularly, of course), or keep a loose-leaf binder with a few printed forms in it. A
typical backup log needs the following information:
You can record these four bits of information in a few seconds. For larger systems, you
can add a few other pieces of information to complete a full backup record:
The dates of the backup help you keep track of when the last backup was performed and
also act as an index for file recovery. If one of your system users knows they deleted a
file by accident a week ago, you can determine the proper backup tape for the file
restoration from the backup log dates.
For convenience, keep the backup log near the system. Some administrators prefer to
keep the log in the same location as the backup media storage instead. Some system
administrators also keep a duplicate copy of the backup log in another site, just in case
of catastrophe. Do what is appropriate for your system.
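If you prefer to keep the log on the computer, a plain text file with one line per backup is enough. The sketch below is only one possible layout: the columns (date, tape label, filesystem, and backup type), the function names, and the sample dates are all assumptions made for illustration.

```shell
#!/bin/sh
# log_backup LOGFILE DATE TAPE FILESYSTEM TYPE
# Appends one tab-separated record to LOGFILE.
log_backup() {
    printf '%s\t%s\t%s\t%s\n' "$2" "$3" "$4" "$5" >> "$1"
}

# tapes_for_date LOGFILE DATE
# Prints the tape labels recorded for DATE.
tapes_for_date() {
    awk -F '\t' -v d="$2" '$1 == d { print $2 }' "$1"
}

# Sample entries with made-up dates, written to a scratch file.
log=$(mktemp)
log_backup "$log" 1996-03-04 "Daily 1" /    full
log_backup "$log" 1996-03-05 "Daily 2" /usr partial

tapes_for_date "$log" 1996-03-05    # prints: Daily 2
```

Remember to print the file regularly, because a log that lives only on the failed disk is of no use.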
The tar (tape archiver) program is usually the command you use to save files and
directories to an archive medium and recover them later. The tar command works by creating
an archive file, which is a single large entity that holds many files within it (much like
PKZIP does in DOS, for example). The tar command only works with archives it creates.
The format of the command is a little awkward and takes some getting used to, but
fortunately most users only need a few variations of the command. The format of the tar
command is as follows:
tar switch modifiers files
The files section of the command indicates which files or directories you want to
archive or restore. You probably want to archive a full filesystem such as /usr. In the
case of recovery, you may want a single file such as /usr/tparker/big_file.
The switch controls how tar reads or writes to the backup media. You can use only one
switch with tar at a time. The valid switches are as follows:
| c | Creates a new archive |
| r | Writes to end of existing archive |
| t | Lists names of files in an archive |
| u | Adds files that are not already archived or that have been modified since they were archived |
| x | Extracts from the archive |
You can add a number of modifiers to the tar command to control the archive and how tar
uses it. Valid modifiers include the following:
| A | Suppresses absolute filenames |
| b | Provides a blocking factor (1-20) |
| e | Prevents splitting files across volumes |
| f | Specifies the archive media device name |
| F | Specifies the name of a file for tar arguments |
| k | Gives size of archive volume (in kilobytes) |
| l | Displays error messages if links are unresolved |
| m | Does not restore modification times |
| n | Indicates the archive is not a tape |
| p | Extracts files with their original permissions |
| v | Provides verbose output (lists files on the console) |
| w | Displays archive action and waits for user confirmation |
The tar command uses absolute pathnames for most actions, unless you specify the A
modifier.
A few examples may help explain the tar command and how to use tar switches. If you are
using a tape drive called /dev/tape and the entire filesystem to be archived totals less
than the tape's capacity, you can create the tape archive with the
following command:
tar cf /dev/tape /
The f option enables you to specify the device name, /dev/tape in this case. The entire
root filesystem is archived in a new archive file (indicated by the c). Any existing
contents on the tape are automatically overwritten when the new archive is created. (You
are not asked whether you are sure you want to delete the existing contents of the tape,
so make sure you are overwriting material you don't need.) If you include the v option in
the command, tar echoes the filenames and their sizes to the console as they are
archived.
If you need to restore the entire filesystem from the tape used in the preceding
example, issue the command:
tar xf /dev/tape
This command restores all files on the tape because no specific directory has been
indicated for recovery. The default, when no file or directory is specified, is the entire
tape archive. If you want to restore a single file from the tape, use the command
tar xf /dev/tape /usr/tparker/big_file
which restores only the file /usr/tparker/big_file.
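You can try the create and single-file restore steps safely without a tape drive by using an ordinary file as the archive. In the sketch below, the scratch directories and filenames are hypothetical stand-ins for /dev/tape and /usr/tparker. The example uses relative names, because GNU tar strips the leading slash from member names by default; the name you ask for on extraction must match the name stored in the archive.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/tparker" "$work/restore"
echo "big" > "$work/src/tparker/big_file"
echo "other" > "$work/src/tparker/other_file"

archive="$work/backup.tar"    # stands in for /dev/tape

# Create the archive; member names are stored relative to $work/src.
( cd "$work/src" && tar cf "$archive" tparker )

# Restore only one file, into a separate directory.
( cd "$work/restore" && tar xf "$archive" tparker/big_file )

ls "$work/restore/tparker"
```

Only big_file appears in the restore directory; other_file stays in the archive.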
Sometimes you may want to obtain a list of all files on a tape archive. You can do this
with the following command:
tar tvf /dev/tape
The t switch lists the archive's contents, and the v option adds details for each file. If the list is long,
you may want to redirect the command to a file.
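Redirecting the listing also gives you a searchable index of what each archive holds. This sketch uses a scratch archive file in place of /dev/tape, and the filename contents.txt is an assumption.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir "$work/docs"
echo "a" > "$work/docs/a.txt"
echo "b" > "$work/docs/b.txt"
( cd "$work" && tar cf "$work/backup.tar" docs )

# Save the verbose listing to a file instead of the console.
tar tvf "$work/backup.tar" > "$work/contents.txt"

# Search the saved listing for a particular file.
grep "b.txt" "$work/contents.txt"
```

A saved listing like this can be kept alongside the backup log to speed up file recovery.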
Most tapes require a blocking factor when creating an archive, but you don't need to
specify a blocking factor when reading a tape because tar can figure it out automatically.
The blocking factor tells tar how much data to write in a chunk on the tape. When
archiving to a tape, you specify the blocking factor with the b modifier. For example, the
command
tar cvfb /dev/tape 20 /usr
creates a new archive on /dev/tape that has a blocking factor of 20 and contains all
the files in /usr. Most tapes can use a blocking factor of 20, and you can assume this
factor as a default value unless your tape drive specifically won't work with this value.
Blocking factors usually need to be changed only for floppy disks and hard disk
volumes. Note that the arguments following the modifiers are in the same order as the
modifiers. The f precedes the b modifier so the arguments have the device before the
blocking factor. The arguments must be in the same order as the modifiers, which can
sometimes cause a little confusion.
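To make the blocking factor and the argument ordering concrete, the following sketch computes the chunk size that a blocking factor implies (tar blocks are 512 bytes each) and then creates an archive with the f and b modifiers. The function name record_bytes and the scratch archive are hypothetical.

```shell
#!/bin/sh
# record_bytes FACTOR
# Prints the size in bytes of one chunk written to the archive,
# given that each tar block is 512 bytes.
record_bytes() {
    echo $(( $1 * 512 ))
}

record_bytes 20    # prints: 10240

# The modifiers are f then b, so the archive name comes before
# the blocking factor in the argument list.
work=$(mktemp -d)
echo "data" > "$work/file.txt"
( cd "$work" && tar cfb "$work/blocked.tar" 20 file.txt )
tar tf "$work/blocked.tar"
```

Swapping the modifiers to bf would require swapping the two arguments as well.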
Another common problem is that a tape may not be large enough to hold the entire
archive, in which case more than one tape will be needed. To tell tar the size of each
tape, you need the k option. This option uses an argument that is the capacity in
kilobytes. For example, the command
tar cvbfk 20 /dev/tape 122880 /usr
tells tar to use a blocking factor of 20 for the device /dev/tape. The tape capacity is
122880 kilobytes (approximately 120M). Again, note that the order of arguments matches
the order of the modifiers.
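The capacity argument is simply the tape size converted to kilobytes. The sketch below does the conversion and prints the resulting command rather than running it, because running it needs a real tape drive; the function name mb_to_kb is hypothetical. Note that GNU tar does not use the k modifier shown in this chapter; it handles multiple volumes with its -M and -L options instead.

```shell
#!/bin/sh
# mb_to_kb MEGABYTES
# Prints the equivalent size in kilobytes (1M = 1024K).
mb_to_kb() {
    echo $(( $1 * 1024 ))
}

capacity_kb=$(mb_to_kb 120)
echo "$capacity_kb"    # prints: 122880

# Print the command from the text with the computed capacity filled in.
echo "tar cvbfk 20 /dev/tape $capacity_kb /usr"
```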
Floppy disks create another problem with tar, as the blocking factor is usually
different. When you use floppy disks, archives usually require more than one disk. You use
the k option to specify the archive volume's capacity. For example, to back up the
/usr/tparker directory to 1.2M floppy disks, the command would be
tar cnfk /dev/fd0 1200 /usr/tparker
where /dev/fd0 is the device name of the floppy drive and 1200 is the size of the disk
in kilobytes. The n modifier tells tar that this is not a tape. As a result, tar runs a
little more efficiently than if the modifier had been left off.
This chapter looked at the basics of backups. You should maintain a backup log and make
regular backups to protect your work. Although tar is a little awkward to use at first, it
soon becomes second nature. You can use the tar command in combination with compression
utilities such as compress or gzip and gunzip, which compress and uncompress a single file
but do not build archives themselves. Although combined archive-and-compress programs may
be more convenient, tar
is still the most widely used archive utility and is therefore worth knowing.
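As a brief sketch of combining tar with a compression utility, the commands below pipe the archive through gzip on the way out and through gunzip on the way back. The scratch directories and filenames are hypothetical, and the dash after f tells tar to use standard output or standard input instead of a device.

```shell
#!/bin/sh
work=$(mktemp -d)
mkdir -p "$work/src/project" "$work/restore"
echo "hello" > "$work/src/project/notes.txt"

# Write the archive to standard output and compress it on the way to disk.
( cd "$work/src" && tar cf - project ) | gzip > "$work/project.tar.gz"

# Uncompress to standard output and let tar read the archive from standard input.
gunzip -c "$work/project.tar.gz" | ( cd "$work/restore" && tar xf - )

cat "$work/restore/project/notes.txt"    # prints: hello
```

The same pipeline works with compress and uncompress if those are the utilities installed on your system.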
A number of scripts are beginning to appear that automate the backup process or give you a menu-driven interface to the backup system. These scripts are not in general distribution, but you may want to check FTP and BBS sites for a utility that simplifies backups for you.